Understanding Chinese Spontaneous Speech - Are Mandarin and Cantonese Very Different?
Abstract
This paper presents a study of the similarity between spoken and written texts in Cantonese and Mandarin. Spontaneous Cantonese speech contains colloquial and filler phrases, but its keywords are similar to those of Mandarin. We use a statistical tool to extract Cantonese phrases from a spontaneous speech database that we collected using a Wizard-of-Oz setup. Additional fillers are collected from written Cantonese downloaded from online newsgroups. We quantify the similarity between Cantonese and Mandarin texts using Zipf's Law of Language Distance [4] and R regression scores. The scores show that Taiwanese and mainland Chinese newsgroup articles are more similar to each other (an R regression score of 0.83) than either is to Hong Kong articles (0.79 and 0.67), underlining the language difference between Cantonese and Mandarin.
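The abstract does not spell out how the R regression score between two corpora is computed, so the following is only a minimal sketch under stated assumptions, not the formulation of [4]. It assumes each corpus is already segmented into tokens, represents each corpus by the log relative frequency of each word (the quantity Zipf's law relates to rank), and reports the Pearson correlation R between the two profiles over the vocabulary the corpora share. The function names `log_frequency_profile`, `pearson_r`, and `corpus_similarity` are hypothetical.

```python
import math
from collections import Counter


def log_frequency_profile(tokens):
    """Map each token to the log of its relative frequency in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: math.log(count / total) for tok, count in counts.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient (the R of a simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("R is undefined when one profile has zero variance")
    return cov / math.sqrt(var_x * var_y)


def corpus_similarity(tokens_a, tokens_b):
    """Hypothetical similarity score (an assumption, not necessarily [4]):
    R between log word frequencies over the vocabulary both corpora share."""
    prof_a = log_frequency_profile(tokens_a)
    prof_b = log_frequency_profile(tokens_b)
    shared = sorted(set(prof_a) & set(prof_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared tokens")
    return pearson_r([prof_a[t] for t in shared], [prof_b[t] for t in shared])


if __name__ == "__main__":
    # Toy, pre-segmented token lists standing in for newsgroup text.
    corpus_a = "的 的 的 我 我 是".split()
    corpus_b = "的 的 我 我 我 是 係".split()
    print(f"R = {corpus_similarity(corpus_a, corpus_b):.3f}")
```

Restricting the comparison to shared words keeps Cantonese-only fillers (such as 係) out of the score, which is one possible reading of the abstract's point that keywords overlap while colloquial phrases differ. A full comparison would also need a policy for unshared vocabulary.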
Similar sources
Multi-accent Chinese speech recognition
Multiple accents are often present in spontaneous Chinese Mandarin speech, as most Chinese speakers have learned Mandarin as a second language. We propose a method to handle multiple accents as well as standard speech in a speaker-independent system by merging auxiliary accent decision trees with standard trees and reconstructing the acoustic model. In our proposed method, tree structures and shape are modi...
Unsupervised Learning of a Chinese Spontaneous and Colloquial Speech Lexicon with Content and Filler Phrase Classification
There is a significant lexical difference (in words and in the usage of words) between spontaneous/colloquial language and written language. This difference affects the performance of spoken language recognition systems that use statistical language models or context-free grammars, because these models are based on the written language rather than the spoken form. There are many filler phrases and colloqu...
Using English Phoneme Models for Chinese Speech Recognition
To build a speech recognizer, database design, collection, and transcription are the most time-consuming and tedious tasks. This paper proposes some fast and easy methods to use English phoneme models for Mandarin and Cantonese speech recognition with little to no training data in Mandarin and Cantonese. While a recognizer built with such transformed models might not perform as ideally as one that ...
Temporal and Tonal Aspects of Chinese Syllables: a Corpus-based Comparative Study of Mandarin and Cantonese
Previous studies on temporal and tonal aspects of languages are usually based on limited data from a small number of subjects. It is difficult to know whether these findings can really represent the general temporal and tonal aspects of continuous speech, or just the speech of the specific subjects involved. Because of this difficulty, it may not be appropriate to directly apply these findings t...
Comparing native and non-native speech rhythm using acoustic rhythmic measures: Cantonese, Beijing Mandarin and English
This study investigates the speech rhythm of Cantonese, Beijing Mandarin, Cantonese-accented English and Mandarin-accented English using acoustic rhythmic measures. They were compared with four languages in the BonnTempo corpus: German and English (stress-timed) and French and Italian (syllable-timed). Six Cantonese and six Beijing Mandarin native speakers were recorded reading the North Wind a...